code completion
Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification
Liu, Aofan, Song, Shiyuan, Li, Haoxuan, Yang, Cehao, Qi, Yiyan
The escalating complexity of modern codebases has intensified the need for retrieval systems capable of interpreting cross-component change intents, a capability fundamentally absent in conventional function-level search paradigms. While recent studies have improved the alignment between natural language queries and code snippets, retrieving contextually relevant code for specific change requests remains largely underexplored. To address this gap, we introduce RepoAlign-Bench, the first benchmark specifically designed to evaluate repository-level code retrieval in change-request-driven scenarios, encompassing 52k annotated instances. This benchmark shifts the retrieval paradigm from function-centric matching to holistic repository-level reasoning. Furthermore, we propose ReflectCode, an adversarial-reflection-augmented dual-tower architecture featuring disentangled code_encoder and doc_encoder components. ReflectCode dynamically integrates syntactic patterns, function dependencies, and semantic expansion intents through large-language-model-guided reflection. Comprehensive experiments demonstrate that ReflectCode achieves a 12.2% improvement in Top-5 Accuracy and a 7.1% improvement in Recall over state-of-the-art baselines, establishing a new direction for context-aware code retrieval.
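To make the dual-tower setup concrete, here is a minimal retrieval sketch in the spirit of the disentangled code_encoder/doc_encoder design. The hash-based bag-of-words encoders are stand-ins invented for runnability; the paper's towers are learned models, and the reflection step is not reproduced here.

```python
# Minimal dual-tower retrieval sketch. The hash-based bag-of-words "encoders"
# are illustrative stand-ins for learned code_encoder / doc_encoder towers;
# only the embed-then-rank flow is the point being shown.
import re
import numpy as np

DIM = 256

def embed(text: str, seed: int) -> np.ndarray:
    """Map text to a fixed-size vector by hashing tokens into buckets."""
    vec = np.zeros(DIM)
    for tok in re.findall(r"\w+", text.lower()):
        vec[hash((seed, tok)) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def code_encoder(code: str) -> np.ndarray:    # stand-in tower for code
    return embed(code, seed=0)

def doc_encoder(query: str) -> np.ndarray:    # stand-in tower for change requests
    return embed(query, seed=0)

def top_k(query: str, corpus: list[str], k: int = 5) -> list[tuple[float, str]]:
    q = doc_encoder(query)
    scored = [(float(code_encoder(c) @ q), c) for c in corpus]
    return sorted(scored, reverse=True)[:k]

corpus = [
    "def read_config(path): return json.load(open(path))",
    "def retry(fn, attempts=3): ...",
    "class ConnectionPool: ...",
]
for score, snippet in top_k("add retry logic to config loading", corpus, k=2):
    print(f"{score:.3f}  {snippet}")
```

The design point the sketch preserves is that queries and code are embedded by separate towers into a shared space, so ranking reduces to a dot product over the candidate set.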
Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets
Galimzyanov, Timur, Kolomyttseva, Olga, Bogomolov, Egor
We study retrieval design for code-focused generation tasks under realistic compute budgets. Using two complementary tasks from Long Code Arena -- code completion and bug localization -- we systematically compare retrieval configurations across various context window sizes along three axes: (i) chunking strategy, (ii) similarity scoring, and (iii) splitting granularity. (1) For PL-PL retrieval, sparse BM25 with word-level splitting is the most effective and practical choice, significantly outperforming dense alternatives while being an order of magnitude faster. (2) For NL-PL retrieval, proprietary dense encoders (the Voyager-3 family) consistently beat sparse retrievers, albeit at roughly 100x higher latency. (3) Optimal chunk size scales with the available context: chunks of 32-64 lines work best at small budgets, and whole-file retrieval becomes competitive at 16,000 tokens. (4) Simple line-based chunking matches syntax-aware splitting across budgets. (5) Retrieval latency varies by up to 200x across configurations; BPE-based splitting is needlessly slow, and BM25 with word splitting offers the best quality-latency trade-off. We thus provide evidence-based recommendations for implementing effective code-oriented RAG systems based on task requirements, model constraints, and computational efficiency.
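The recommended PL-PL configuration (line-based chunks, word-level splitting, BM25) is easy to sketch. The snippet below is a minimal illustration assuming the open-source rank_bm25 package; the toy corpus, the 32-line chunk size, and the query are placeholders, not the paper's exact setup.

```python
# Line-based chunking + word-level splitting + BM25: the configuration the
# paper finds most practical for PL-PL retrieval under small budgets.
import re
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def chunk_by_lines(text: str, chunk_lines: int = 32) -> list[str]:
    """Simple line-based chunking; the paper finds it matches syntax-aware splitting."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + chunk_lines])
            for i in range(0, len(lines), chunk_lines)]

def word_split(text: str) -> list[str]:
    """Word-level splitting: fast and effective for code-to-code retrieval."""
    return re.findall(r"\w+", text.lower())

# Toy "repository"; in practice this would be every source file in the repo.
repo_text = "\n".join(
    f"def handler_{i}(request): return process(request, {i})" for i in range(200)
)
chunks = chunk_by_lines(repo_text, chunk_lines=32)
bm25 = BM25Okapi([word_split(c) for c in chunks])

query = "def handler_42(request):"            # a PL-PL (code-to-code) query
scores = bm25.get_scores(word_split(query))
top = sorted(range(len(chunks)), key=scores.__getitem__, reverse=True)[:5]
print([(round(float(scores[i]), 2), i) for i in top])
```

Because both indexing and querying are plain tokenization plus sparse scoring, this pipeline needs no GPU, which is where the order-of-magnitude latency advantage over dense encoders comes from.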
LlavaCode: Compressed Code Representations for Retrieval-Augmented Code Generation
Cherniuk, Daria, Sukhorukov, Nikita, Sushko, Nikita, Gusak, Daniil, Sivtsov, Danil, Tutubalina, Elena, Frolov, Evgeny
Retrieval-augmented generation has emerged as one of the most effective approaches to code completion, particularly when context from the surrounding repository is essential. However, incorporating this context significantly extends sequence length, leading to slower inference - a critical limitation for interactive settings such as IDEs. In this work, we introduce LlavaCode, a framework that compresses code into compact, semantically rich representations interpretable by a code LLM, enhancing generation quality while reducing the retrieved context to only a few compressed single-token vectors. Using a small projector module, we significantly increase the exact match (EM) and edit similarity (ES) metrics of the coding model with a negligible increase in latency. Our experiments demonstrate that compressed context enables a 20-38% reduction in Time-to-First-Token (TTFT) on line completion tasks compared to full-RAG pipelines.
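A minimal sketch of the projector idea follows: pool a retrieved snippet into one embedding, then map it to a few soft tokens in the LLM's embedding space. The dimensions, the two-layer MLP, and the count of four soft tokens are assumptions made for illustration, not LlavaCode's actual configuration.

```python
# Sketch of a context projector: map a retrieved snippet's pooled embedding
# into a handful of soft tokens in the code LLM's embedding space. All sizes
# and the MLP shape below are illustrative assumptions.
import torch
import torch.nn as nn

class ContextProjector(nn.Module):
    def __init__(self, retriever_dim=768, llm_dim=4096, n_soft_tokens=4):
        super().__init__()
        self.n = n_soft_tokens
        self.proj = nn.Sequential(
            nn.Linear(retriever_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim * n_soft_tokens),
        )

    def forward(self, snippet_emb: torch.Tensor) -> torch.Tensor:
        # snippet_emb: (batch, retriever_dim), pooled embedding of one snippet
        out = self.proj(snippet_emb)                          # (batch, llm_dim * n)
        return out.view(-1, self.n, out.shape[-1] // self.n)  # (batch, n, llm_dim)

projector = ContextProjector()
soft_tokens = projector(torch.randn(1, 768))                  # -> (1, 4, 4096)
# These vectors would be concatenated with the prompt's token embeddings
# before the LLM forward pass, replacing thousands of raw context tokens.
print(soft_tokens.shape)
```

The latency win comes directly from sequence length: each retrieved snippet costs a handful of embedding slots instead of its full token count, so prefill (and hence TTFT) shrinks.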
SpareCodeSearch: Searching for Code Context When You Have No Spare GPU
Retrieval-Augmented Generation (RAG) frameworks aim to enhance Code Language Models (CLMs) by adding a module that retrieves relevant context to construct the input prompt. However, these retrieval modules commonly rely on semantic search, which requires substantial computational resources for training and hosting embedding models, making it infeasible to integrate into lightweight applications such as in-IDE AI-based code completion. In this solution paper, we show that keyword search is sufficient to retrieve relevant and useful code context from large codebases, without the need for extensive GPU resources. The usefulness of the code contexts found by our solution is demonstrated through their completion results on the Code Context Competition's benchmark, reaching chrF scores of 0.748 and 0.725 on the Kotlin and Python tracks, respectively.
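A keyword-search retriever in this spirit fits in a few lines of standard-library Python. The identifier heuristic and the overlap scoring below are illustrative choices, not the paper's exact solution.

```python
# Minimal keyword-search retrieval for code context: extract identifiers from
# the code near the cursor, then rank repository files by keyword overlap.
# Pure standard library; the heuristics and paths are illustrative choices.
import re
from collections import Counter
from pathlib import Path

def identifiers(code: str) -> set[str]:
    # Keep longer, information-carrying identifiers; short tokens are mostly noise.
    return {t for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code) if len(t) > 3}

def retrieve_context(prefix: str, repo_root: str, k: int = 3) -> list[str]:
    keywords = identifiers(prefix)
    scores: Counter[str] = Counter()
    for path in Path(repo_root).rglob("*.py"):   # extend the glob per language track
        text = path.read_text(errors="ignore")
        scores[str(path)] = len(keywords & identifiers(text))
    return [p for p, s in scores.most_common(k) if s > 0]

# Usage: pass the file prefix preceding the completion point.
print(retrieve_context("def render_markdown(source: str) -> Html:", "."))
```

Everything above runs on a CPU with no index-building beyond a file scan, which is exactly the property that makes it viable inside an IDE plugin.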
Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding
Pavlichenko, Nikita, Nazarov, Iurii, Dolgov, Ivan, Garanina, Ekaterina, Ustalov, Dmitry, Bondyrev, Ivan, Lysaniuk, Kseniia, Vu, Evgeniia, Chekmenev, Kirill, Shtok, Joseph, Golubev, Yaroslav, Semenkin, Anton, Sazanovich, Uladzislau
We present the Mellum model family: open-weight code completion models designed for interactive use in JetBrains IDEs. Mellum models have 4B parameters, adopt a Llama-style architecture, and are pre-trained on ~4T tokens of permissively licensed, multi-language code. Our studies show that (i) careful data curation and staged training significantly improve model quality, (ii) editor-critical capabilities such as context packing are necessary for high-quality suggestions, and (iii) a compact, task-focused model can meet the cost and latency constraints of interactive completion. In the paper, we describe an end-to-end industrial pipeline for producing contextualized in-editor completion: disciplined data governance, multi-stage training that includes fill-in-the-middle and project context via supervised fine-tuning, and alignment via direct preference optimization using feedback from real-world scenarios. Our quality evaluations include both large-scale offline benchmarks and online telemetry from production deployments in JetBrains IDEs. Mellum models are released under the Apache-2.0 license on HuggingFace, with a public model card providing a reproducible reference for practitioners. Our experience offers a pragmatic blueprint for taking a focused, open model from research prototype to at-scale production serving hundreds of thousands of users.
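Context packing plus fill-in-the-middle prompting can be sketched as follows. The FIM sentinel strings, the ~4-characters-per-token estimate, and the 8k budget are placeholders: real sentinels and limits are model- and tokenizer-specific and are not taken from the paper.

```python
# Sketch of fill-in-the-middle prompt assembly with context packing under a
# token budget. Sentinel strings and the budget are hypothetical placeholders.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def rough_tokens(s: str) -> int:
    return max(1, len(s) // 4)   # crude ~4 chars/token heuristic, not a real tokenizer

def build_fim_prompt(prefix: str, suffix: str,
                     ranked_context: list[str], budget: int = 8192) -> str:
    used = rough_tokens(prefix) + rough_tokens(suffix) + 8   # reserve for sentinels
    packed = []
    for snippet in ranked_context:       # pack best-ranked snippets until full
        cost = rough_tokens(snippet)
        if used + cost > budget:
            break
        packed.append(snippet)
        used += cost
    context = "\n\n".join(packed)
    return f"{context}\n{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt("def connect(url):\n    ", "\n    return session",
                          ["# utils/http.py\nclass Session: ...",
                           "# config.py\nTIMEOUT = 30"])
print(prompt)
```

The model completes the span after the middle sentinel, which is what makes the format suitable for cursor-position completion rather than left-to-right continuation only.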
Smart Paste: Automatically Fixing Copy/Paste for Google Developers
Nguyen, Vincent, Herzog, Guilherme, Cambronero, José, Revaj, Marcus, Kini, Aditya, Frömmgen, Alexander, Tabachnyk, Maxim
Manually editing pasted code is a long-standing developer pain point. In internal software development at Google, we observe that code is pasted 4 times more often than it is manually typed. These paste actions frequently require follow-up edits, ranging from simple reformatting and renaming to more complex style adjustments and cross-language translations. Prior work has shown that deep learning can be used to predict these edits. In this work, we show how to iteratively develop and scale Smart Paste, an IDE feature for post-paste edit suggestions, to Google's development environment. This experience can serve as a guide for AI practitioners on a holistic approach to feature development, covering user experience, system integration, and model capabilities. Since deployment, Smart Paste has received overwhelmingly positive feedback, with a 45% acceptance rate. At Google's enterprise scale, accepted suggestions account for over 1% of all code written company-wide.
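As a rough sketch of what a post-paste suggestion loop looks like, the snippet below sends the pasted code plus its surrounding context to an edit model and renders the proposal as a diff. Every name here is hypothetical; nothing below reflects Google's internal implementation.

```python
# Hypothetical shape of a post-paste edit-suggestion flow: prompt an edit
# model with the paste site, then show the rewrite as an accept/reject diff.
import difflib

def suggest_edit(model, before: str, pasted: str, after: str) -> str:
    """Ask an edit model to adapt the pasted snippet to its new surroundings."""
    prompt = ("Adapt the pasted code to fit its new location.\n"
              f"--- code before paste site ---\n{before[-2000:]}\n"
              f"--- pasted snippet ---\n{pasted}\n"
              f"--- code after paste site ---\n{after[:2000]}\n")
    return model(prompt)          # expected to return the rewritten snippet

def on_paste(model, before: str, pasted: str, after: str) -> None:
    suggestion = suggest_edit(model, before, pasted, after)
    diff = difflib.unified_diff(pasted.splitlines(), suggestion.splitlines(),
                                "pasted", "suggested", lineterm="")
    print("\n".join(diff))        # an IDE would render this inline for accept/reject

# Toy stand-in "model" that renames the variable to match local style.
on_paste(lambda prompt: "resp = fetch(url)",
         before="resp = get_user()\n", pasted="response = fetch(url)", after="")
```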
LongCodeZip: Compress Long Context for Code Language Models
Shi, Yuling, Qian, Yichun, Zhang, Hongyu, Shen, Beijun, Gu, Xiaodong
Code generation under long contexts is becoming increasingly critical as Large Language Models (LLMs) are required to reason over extensive information in the codebase. While recent advances enable code LLMs to process long inputs, high API costs and generation latency remain substantial bottlenecks. Existing context pruning techniques, such as LLMLingua, achieve promising results for general text but overlook code-specific structures and dependencies, leading to suboptimal performance in programming tasks. In this paper, we propose LongCodeZip, a novel plug-and-play code compression framework designed specifically for code LLMs. LongCodeZip employs a dual-stage strategy: (1) coarse-grained compression, which identifies and ranks function-level chunks using conditional perplexity with respect to the instruction, retaining only the most relevant functions; and (2) fine-grained compression, which segments retained functions into blocks based on perplexity and selects an optimal subset under an adaptive token budget to maximize relevance. Evaluations across multiple tasks, including code completion, summarization, and question answering, show that LongCodeZip consistently outperforms baseline methods, achieving up to a 5.6x compression ratio without degrading task performance. By effectively reducing context size while preserving essential information, LongCodeZip enables LLMs to better scale to real-world, large-scale code scenarios, advancing the efficiency and capability of code intelligence applications.
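The coarse-grained stage can be approximated with any causal LM: score each function-level chunk by the conditional perplexity of the instruction given that chunk, then keep the lowest-perplexity (most relevant) chunks. The sketch below uses Hugging Face transformers with GPT-2 as a stand-in scorer; LongCodeZip's actual chunking, adaptive budgeting, and fine-grained stage are not reproduced.

```python
# Sketch of perplexity-based chunk ranking: lower conditional perplexity of
# the instruction given a chunk means the chunk is more relevant. GPT-2 and
# the toy function list are placeholders for a real code LM and repository.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def conditional_ppl(chunk: str, instruction: str) -> float:
    """Perplexity of `instruction` tokens when conditioned on `chunk`."""
    chunk_ids = tok(chunk, return_tensors="pt").input_ids
    instr_ids = tok(instruction, return_tensors="pt").input_ids
    ids = torch.cat([chunk_ids, instr_ids], dim=1)
    labels = ids.clone()
    labels[:, : chunk_ids.shape[1]] = -100       # score only the instruction tokens
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    return math.exp(loss.item())

functions = ["def load(path): ...", "def train(model, data): ...",
             "def plot(history): ..."]
instruction = "resume training from the last checkpoint"
ranked = sorted(functions, key=lambda f: conditional_ppl(f, instruction))
kept = ranked[:2]                                # retain the most relevant chunks
print(kept)
```

In the paper's full pipeline this selection is followed by a second, block-level pass inside each retained function, with the token budget adapted to the task; the snippet above only illustrates the ranking signal itself.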